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1 Abstract and Context 

In Yukon, a new scalar data type for XML data is being introduced. A relational table can have 
one or more XML columns that may be untyped or typed according to a specified XML schema. 
The XML values populating these columns are stored as blobs (varbinary (max)) for easier 
retrieval. For many purposes, the XML data type column behaves like an varbinary (max) 
column 1 . 

Queries are slow since the XML blob must be parsed each time a query is executed on it If the 
result of parsing is saved / then queries can use it and run significantly faster. This necessitates 
shredding; the XML data is into its basic components (XML nodes) and storing those in a separate 
XML index. 

From the user's perspective, only the XML columns are visible, and the user creates the XML 
index on the XML column. This is not an ordinary index on a table — behind the scene, the 
engine creates structures that for convenience are called the node table in this document. The 
engine populates the node table from the XML blobs stored in the XML column in the spirit of a 
regular index. The engine components manage the correlation between the XML column and the 
underlying node table transparently, so that users continue to submit their queries on the XML 
column (instead of the node table). To prevent inadvertent use of the shredded XML form, users 
cannot bind to the node table directly for querying or manipulating the data stored therein. 

In terms of implementation, the node table is created within the XML index creation DDL as an 
indexed view. Indexes on the node table are created by default. 



1 The CTypelnfo for the column has the value XVT_XML 
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2.1 Node Table 

Storage for XML columns X and Y in a user table R and its underlying node tables is as follows: 
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In the node table, XID is an identifier {OrdPatii) for the XML node represented by the row. It 
captures hierarchical relationship and the order among the nodes in the XML data (called 
document order). The XID of the node's parent is stored in the computed PID column. PID is a 
prefix of the node's XID and is an OrdPath itself. 

NID is a token value for the XML node name, while TID is a token value for the type of the XML 
node; The VALUE column stores the value, if one exists, of the node up to a pre-determined 
length (128 bytes). Longer values are stored in the LVALUE column with the first 128 bytes 
stored in the VALUE column as well. 
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The clustered, primary key of the node table is composed of the primary key of the user table and 
the XID column; hence, the primary key of the user table cannot contain more than 15 columns. If 
the clustered, primary key in the user table contains 16 columns, then no XML column can be 
added to the table. 



2.2 Indexes on Node Table 

Indexes on the node table are the following: 

- Clustered index on (PK, XID), which causes clustering in depth-first order of the XML 
hierarchy 

- NAME INDEX: Secondary index on (NID, TID, P1D, PK, and XID). 

- VALUE INDEX: Secondary index on (VALUE, NID, TID, PK, and XID). 

The NAME index is created automatically since it is useful for almost all queries on the XML 
data. An example is /Customer/ Order, in which this index is looked up with the token values 
^Customer and #Order to determine the corresponding XML nodes. 

The VALUE index is useful for value-based queries, and is created at the same time as the node 
table. Optional value indexes will be considered for future releases. 



3 Description 

3.1 XML Index Syntax 

The syntax for XML indexes is the standard one for creating an index except that only a single, 
XML column is involved: 

CREATE XML INDEX index^name ON Cable (Column) 
[WITH (<index_option> (, ...n ]} ] 

<index_option> : : ■» 

PAD_ INDEX » {ON | OFF ) 
FILL FACTOR ■» fil If actor 
SORT INJTEMPDB « {ON | OFF ) 
STATISTICS_NORECOMPUTE = {ON | OFF ) 
DROPJ5XISTING - {ON | OFF) 
ALLOW JROWJJOCKS- {ON | OFF) 
ALLOW_P AGE LO CKS = { ON j OFF) 
MAXDOP»number_of _proce Bsora 

This creates the necessary node table structure and secondary indexes on it, and populates those 
structures from the values in the XML column. All indexing options are available, except 
lGNORE_DUP„KEY and ONLINE, which are always OFF. For example, if the DROPJSXISTING 
option is specified, then the existing XML index is dropped and a new one created in its place. 

The default values for the options are as follows: 
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PAD_ INDEX = OPF 
FILLPACTOR b 0 
SORT_IN_TEMPDB = OFF 
STATISTICS_NORECOMPUTE w OFF 
DROP^EXISTING ■ OFF 
ALLOW_ROW_LOCKS= ON 
ALIX)W~PAGE_LOCKS= ON 
MAXDOP - 0~ 



An XML index creation is permitted if the user table contains a clustered, primary key. 
Otherwise, XML index creation fails. If an XML index exists on any XML column in a table, the 
clustered primary key of the table cannot be modified. Users have to drop all XML indexes on the 
table before modifying the clustered, primary key. 

























. ..1 .. -diM*.lz —J 



Structures necessary to support the indexing behavior (e.g. indexed view) is created as part of 
this DDL. These structures are populated with the values from the XML column. Updates 
occurring on the XML column are propagated to these XML index structures as well. Since XML 
data type instances are incomparable, no statistics can be created on an XML column. 



As a design philosophy, actions on the XML index should be propagated to the underlying 
storage (e.g. indexed view) for user-friendly behavior. However, it is not always desirable or 
feasible to do so. The following subsections provide the feature description of XML index. 



", ' ^| "I 
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3.2 Optional XML Indexes 

Customers predominantly want the ability to specify the parts of XML instances that should be 
indexed. That is, they want greater control over the granularity of to be indexed. This saves space 
compared to indexing the full XML instances. Consequently, insertions and updates to the XML 
column are faster. 

We can provide more control in the following ways: 

1. Allow users to create the index on tags. Only the tag or the subtree under the tag (partial 
slmdding) may be indexed. By default, the XML index then creates the node table only. In 
a variation, the sub-tree can be stored as a blob for easier retrieval. 

2. Allow users to create an index on values for value-based searches such as // section [@* ■ 

3. Allow users to specify the paths on which tags and values are to be indexed. There paths 
may be relative or absolute. Alternatively, we can allow users to specify the paths which 
should be excluded from indexing. 

4. Allow users to index specified paths. 

5. Allow full- text index to be built on the XML index. 

6. Allow users to index children of nodes. This is useful for navigational access. 

These options - except partial shredding and possibly full-text index on XML index — check for 
satisfiability of the search condition on the instances in the XML column. The subtrees under the 
nodes that satisfy the search condition are not retrieved. Entire XML instances may be returned. 
To extract the subtrees from under the nodes satisfying the search condition, an outer query must 
be executed to do so. 

Partial shredding can return the subtree but serializing it out from the index. 




3.4 Drop XML Index 

The syntax for dropping an XML index is the regular DROP INDEX command except that an 
XML index is indicated and an XML index name is specified. 
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DROP INDEX index _hame ON table [ , . . .n J 

This results in dropping the XML index from meta-data and all associated storage. For example, 
if the underlying implementation is indexed view, then the indexed view and its indexes are 
dropped as well. 

Whenever an XML index is dropped, the QNAME entries occurring in the XML index storage are 
deleted from meta-data storage. 

The DROP INDEX statement does not support the deprecated syntax (DROP INDEX 
TableJndex). 



3.5 Drop XML Column and Drop Table Containing XML Columns 

An XML column is dropped using the regular ALTER TABLE command. This causes the column 
to be deleted from the table following the usual engine logic. If an XML index exists on the 
column, then the column cannot be dropped but returns an error. This behavior is consistent with 
columns of hon-XML types. 

When a table containing one or more XML columns is dropped, all XML indexes defined on 
those XML columns are dropped as well. This causes the supporting structures of the XML 
indexes to be dropped as well. 



3.6 Altering XML indexes 

Altering an XML index is allowed for modifying B-tree options (e.g. re-building the indexes). For 
XML index/the appropriate indication is supplied in the DDL below; 

ALTER UNDEX {index_name \ ALL) ON {table | view} 

{ REBUILD [WITH ( <rebuild_index_option> { , . . . nl ) ] 
I DISABLE 

1 REORGANIZE [ WITH (I^B_COM PACTIONS ON | OFF) ) ] 

I SET {<set_index_option> [ , ...n] ) 

) 

<rebuild_index_pption a > : : = 
PAD_ INDEX » {ON | OFF ] 
| FILL FACTOR «■ fillfactor 
| SORT_lN_TEMPDB • * {ON | OFF } 
j STATISTICS_NORECOMPUTE = {ON | OFF ) 
j ALLOW_ROW__LOCKS« {ON | OFF) 
j ALLOW PAGE LQ C KS - ( ON j OFF) 

I MAXDOP=number_of_j)rocessors 

<set_index_option> : : = 

ALL0W_R0W_L0CKS= {ON j OFF) 



* DROP^EXJSTING is not allowed in the ALTER INDEX syntax. 
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| ALLOW_PAGE_LOCKS= { ON | OFF} 

j STATISTICSJSORECOMPUTE - {ON | OPP} 

In Yukon, an XML index can be created on an XML column in a table but not in a view. Hence, a 

view name specified for an XML index in the ALTER INDEX statement results in an error. 

i 

The rebuild options ONLINE and IGNORE JDUPJCEY, and the set option IGNOREJXJPJKEY 
are not valid for XML index . 

Modification of the PK constraint in the user table is not automatically propagated to XML 
indexes; rather, the burden is on the user to drop all XML indexes on the table and to re-create 
them. If the user attempts to change the PK constraint while XML indexes exist on the table, a 
meta-data error is generated. 

Whenever altering an XML index causes re-building the index, QNAME cleanup occurs in meta- 
data. The QNAME entries in meta-data that no longer occur in the XML index structures are 
removed from meta-data storage. 

The DISABLE option is supported on an XML index. If this option is set, the index is no longer 
used in query plans and not maintained. 

If the option ALL is specified, it applies to both non-XML and XML indexes. Other options may 
be specified that are not valid for both types of indexes (e.g. ONUNE*ON option for an XML 
index, or IGNORE^DUP^KEY-ON for a non-unique index or an XML index). In such cases, the 
entire statement fails. 



3.7 Miscellaneous Operations on XML Index 

Although an XML index is created in the same namespace as non-XML indexes, some operations 
on XML indexes may not work. Examples of such operations are: 

• Sp_helplndex 

Such stored procedures are being deprecated. Instead, the XML index information will be 
found in catalog views. 

Meta-data functions - INDEX^COL, INDEXKEY_PROPERTY - which accept index id as 
arguments will work for XML index. 

DBCC CHECKTABLE (table, index-id) will look for a regular and XML index on the table and 
check the XML index as a regular table with non-clustered indexes. 



3.8 Transaction Isolation Levels 

All transaction isolations all supported on XML data type. The behavior is similar to other first- 
class relational data types. 




- : REDACTED; 
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4 index Behavior in other cases 



4.1 Replication 

If the XML index is created on an XML column, the index is automatically replicated. This creates 
the underlying storage (Le. indexed view) at the other end of replication. The XML blobs are 
written into the log for transactional replication, just like text columns. Thus, text and XML 
columns are treated the same way for replication. 

Data in the underlying storage for XML index (i.e. the indexed view) are not replicated. Rather, at 
the target end, the XML blobs are pieced together from the log the same way as text data, and 
assigned to the corresponding XML column. The XML index mechanism ensures that the update 
is propagated to the underlying storage (indexed view). 



If the XML data type column is constrained by a schema, the constraining XML schema must be 
available at the subscriber for replication of the XML column. 



REDACTED ' : \ t&lWMB^ 



If XML index is implemented as indexed view, replication of the indexed view is not allowed 
directly but results in an error. Allowing it yields significant complexity without adding much 
value. 



4.2 DBCC 

DBCC commands involving an XML index are propagated to the underlying storage (indexed 
view). This preserves users' perception of an XML index being analogous to a secondary index on 
any other SQL type. 



Currently, DBCC commands on a table T do not perform the same checks on an indexed view V 
on T; the calls must be made separately on V. However, propagating the DBCC checks to indexed 
views requires SE and QP enhancements. 



DBCC Function 


Action on XML Blob column 


CHECKCATALOG 

CHECKCONSTRAINTS 
CHECK DB 

'CHECKFI LEGROUP 
CHECKTABLE 


This is a meta-data consistency check. There is no work for XML 

index created as indexed view 

Performed by existing DBCC functionality 

No logical XML checks are performed on the XML blob. This 

assumes that physical checks are adequate to ensure the indexed 

view is in sync with the XML blobs. If there is a discrepancy, the 

XML index can be re-built. 

This DBCC function behaves the same way as CHECKDB but is 
scoped to the specified hie group. 

1. Physical checks are assumed to be adequate, and no well- 
formedness or validation of XML blobs is needed. 

2. CHECKTABLE is not propagated to the underlying storage 
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DBREINDEX (ALTER 
XML INDEX ... 
REBUILD) 

INDEXDEFRAG (ALTER 
XML INDEX ... 
REORGANIZE) 
SHOW^STATISTICS 



SHOWCONTIG 
UPDATEUSAGE 



(indexed view) whether or not NOINDEX is specified. 
Instead, users must directly make the DBCC call on the 
indexed view (which should be allowed although the index 
view is non-bindable) and provides the current behavior. 
3. If REBUILD option is specified, then the XML index is 
regenerated. This results in QNAME cleanup as well. 

1. The XML index is re-built. This requires regeneration of the 
underlying storage. 

2. QNAME entries are cleaned up. 
Similar issues as DBREINDEX 



1 . Statistics on an XML column is not allowed, so that this 
command returns an error. 

2. Statistics cannot be created/ updated and viewed on columns 
in the underlying storage (indexed view). For indexed view 
implementation, the indexed view is non-bindable 5 . 

Same issues as SHOW ^STATISTICS 
Same issues as SHOW_STATISTICS 



The actions required for DBCC are dependencies on the SE team. 



4.3 BCP 

BCP out of the user table copies the XML blobs; neither the XML index definition nor the content 
of the underlying storage is output. Users cannot BCP out the underlying indexed view since it is 
non-bindable. 

The XML blob can be loaded into a similar column at the destination database. If an XML index 
definition exists at the target then it is populated. Users may expect this behavior since indexes 
on non-XML columns are handled the same way. 

BCP in requires well-formedness check for XML as well as validation if an XML schema is 
attached to the target XML column. 

If the XML data type column is constrained by a schema, the constraining XML schema must be 
available at the target for BCP in to succeed. 

4.4 Partitioning 

XML data type instances are internally stored as LOB in the user table. The partitioning or 
filegroup specification on the user table applies to it: For example, if LOBs are to be stored in a 
different partition or filegroup than the rest of the user table, then XML blobs will follow the 
same pattern. 

The XML index (internal table) is in the same partition or filegroup as the user table it is defined 
in (and not collocated with the LOB partition/ filegroup in the user table if the LOB 



3 "Non-bindable" means that a user cannot bind to the content or the schema of the indexed view using T- 
SQL or DDL. 
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partition/filegroup is different from that of the user table). This yields faster query performance 
at the expense of data modification speed. 

This requires that 

• The partitioning scheme defined on the user table is applied to the underlying storage for 
XML index. 

• The clustered, primary key is copied from the user table into the underlying storage for 
XML index. While this can be relaxed a little, this restriction turns out to be the most 
straightforward and robust one. 

When the partitioning scheme is modified (e.g. to specify NEXTJUSED), the modification is 
propagated to the underlying storage. 

Non-clustered indexes on the internal table of the XML index are in the same partition/filegroup 
as the internal table. 

When the partitioning scheme of the user table is changed to a new one, if the partitioning 
columns are different, the XML index must be re-generated to include the partitioning columns. 
Hence, such partitioning scheme changes result in error as long as one or more XML indexes exist 
on the table. The user must explicitly (using DROP INDEX statement) drop all XML indexes on 
the table, change the partitioning schema, and re-build the indexes. 

No XML indexes can exist on the user table when the clustered index on the user table is re- 
created with DROP_EXISTiNG » ON and a new partitioning scheme. On an XML index exist, 
then the re-creation fails. 



5 System-defined Table-valued Function 

The table-valued function is created as follows: 



CREATE FUNCTION SHRBDJCML ( 

©xrnlcol XML) 
RETURNS ©ShreddedRows TABLE ( 



BEGIN 



XID 

MID 

TID 

VALUE 

LVALUE 

HID 

) AS 

DECLARE 



Or d Path, 
int, 
int , 

sql_variant t 
nvar char (max) , 
int 



XML column on which to execute 

node id 
- - the name ID value 
the type ID value 
(small) value column 
large value column 
tokenized path 



©Temp TABLE ( 

XID OrdPath, NID int, TID int, VALUE sq Invariant, 
LVALUE nvar char (max) , . . . ) 



INSERT 
SELECT 
FROM 



INTO ©Temp 

RowBet~generated-by~8hredding-XHL-BL0B-®xmlcol-with- 
path- expression -and -max- level- applied 



INSERT 



INTO ©ShreddedRows 
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SELECT XID, NID, TID, VALUE, LVALUE, HID FROM OTemp 

RETURN 

END 

The XML column is passed in as an argument as required. 



. * " - " 'REDACTED - / , ; ^ 



6 Implementation of XML Index 

When a user issues the CREATE INDEX statement on an XML data type column, a meta-data 
object is created for the index. In addition, auxiliary structures are created automatically within 
the same DDL (i.e. not explicitly by the user) to implement the node table as an indexed view. 

The following subsections describe the behind-the-scenes activities that go on to implement the 
| node table. The XML subsystem uses the table-valued function.of Section J to create the indexed (PetetediT 
view. 



6.1 Indexed View Implementation 

6.1.1 View Definition 

The indexed view is created as 

CREATE VIEW index- name AS 

SELECT Ti.CK, NT. * 

FROM table T 

APPLY SHRED JCML (X) NT 

Where CK is the non-null clustering key in the user table on which the view is being created. 
When XML values in the user table are updated, the changes are propagated to the indexed view. 
In principle, only the incremental difference needs to be propagated. However, owing to the 
occurrence of the table-valued function, the computation of the difference is hard. Instead, the 
rows in the node table corresponding to the old XML values are deleted, and new ones for the 
updated XML values are inserted. This generates a significant number of log records for the node 
table. 



The indexed view is non-upda table but visible to the user in the catalog. To avoid any user 
dependency on its contents, the view is made non-bindable in meta-data, so that user queries 
against the indexed view are failed by.QP. 
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Since the indexed view is visible, users can freely create (CREATE STATISTICS), view (DBCC 
SHOVSLSTATISTICS) and update statistics (UPDATE STATISTICS) on the individual columns of 
the indexed view. 
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6,1.2 Clustered Index Definition 

Once this view has been created, the first clustered index on it is defined as 



CREATE UNIQUE CLUSTERED INDEX internal- index-name 
ON index-name (CK, XID) 
[WITH <index_option> [ f ...n ] ] 

(ON {partition_scheme_name (column^naxne [1, ...n}) 



: . • • 



filegroup }) 



< 



6.1.3 Secondary Index Definitions 

Thereafter, the secondary indexes are created. 



Name index 


CREATE INDEX internal- index - name 
ON index-najne (NID, TID) 


Value index 


CREATE INDEX internal- index-name 

ON index-name (VALUE, NID, TID) 
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6,1.4 Modifying XML Instances 

In principle, the delta change can be propagated by the QP to the indexed view. This, however, is 
quite difficult to achieve since a table-valued function is involved in the view definition. 

A much simpler strategy — the one adopted for the Yukon release — is for QP to replace the 
existing XML instance with the new one. This causes the corresponding, old rows in the node 
table to be dropped and the new ones inserted. This approach suffers from the following 
drawbacks: 

- A log record is generated for each row deleted from and inserted into the indexed view. 

- Both the old and the new states of the XML blob are logged. That is, the log records 
contain more than just the difference between the two values. Consequently, the log is 
bigger. This impacts scenarios such as replication. 



6.1.5 Using SET Options 

Indexed view requires SQL options for consistent results. These are shown in the table below. 



SET 


Required 


Default 


OLE DB & 


DB LIB 


HTTP 


Options 


Value for 


Server 


ODBC 


Value 


Value 


Indexed View 


Value 


Value 






ANSLNULLS 


ON 


OFF 


ON 


OFF 


ON 


ANSI PADDING 


ON 


ON 


ON 


OFF 


ON 


ANSI_ WARNING 


ON 


OFF 


ON 


OFF 


ON 


ARITHABORT 


ON 


OFF 


OFF 


OFF 


ON 


CONC AT_N ULL_YIELDS 


ON 


OFF 


ON 


OFF 


ON 


JNULL 












NUMERICJROUNDABORT 


OFF 


OFF 


OFF 


OFF 


OFF 


QUOTEDJDENT1FIER 


ON 


OFF 


ON 


OFF 


ON 



The SQL options must be set to the values shown in the Required Value column whenever these 
conditions occur: 



- The XML index is created, since an underlying indexed view may be created. - 

- There is anJNS^ 

participating in an XML index, or an up date of the PK column values in the user table . 
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